Analyzing the Cause of Air Pollution in Korea

Objective

Air pollution, especially particulate matter (PM), is growing more severe each year.
This project aims to:
    1. Analyze the factors that affect air pollution, especially particulate matter, in Korea.
    2. Examine the claim that transportation affects particulate matter in Korea.
    3. Apply EDA (Exploratory Data Analysis) functions to present the analysis graphically.

Problem

What are the factors that affect particulate matter in Korea?

Observation

According to the WHO, transportation is increasing particulate matter (PM) globally. The WHO claims that road transport can contribute up to 50% of PM emissions in OECD countries. Note that Korea is a member of the OECD.

Source: https://www.who.int/sustainable-development/transport/health-risks/air-pollution/en/

According to the Korean Particulate Matter Information Center, transportation is a main source of PM emissions in Korea as well. Rather than simply blaming China for a lack of effort in mitigating pollution, Seoul should reduce particulate matter through efficient TDM (Transportation Demand Management).

Source: https://bluesky.seoul.go.kr/news-list/major-news/page/22?article=358

Starting February 15, the new “Particulate Matter Mitigation Act” introduces restrictions on transportation to mitigate PM emissions in Seoul. Drivers of “Level 5 Emission” vehicles will be fined. On certain days and in certain locations, only certain vehicles will be allowed to pass.

Source: https://bluesky.seoul.go.kr/finedust/emergency_reduction_measures

Hypothesis

Transportation is a main source of PM emission in Korea. If the new transportation law is implemented, it will lower the overall level of particulate matter in Seoul.

Data Wrangling

About the Dataset

Data Source: http://airemiss.nier.go.kr/common/downLoad.do?siteId=airemiss&fileSeq=411

This dataset provides the average air pollution emitted by different sectors in 2015. It breaks emissions down by location, industry, and type of fuel used. The 2015 data was used because datasets after 2015 were not available.
     - The dataset is provided by the Korean Ministry of Environment.
     - This dataset was used to analyze the effect of transportation in Korea.

Dataset

Raw dataset as downloaded from its source.

시도 | 시군구 | 배출원대분류 | 배출원중분류 | 배출원소분류 | 연료대분류 | CO | NOx | SOx | TSP | PM10 | PM2.5 | VOC | NH3 | BC
서울특별시 | 종로구 | 비산업 연소 | 상업 및 공공기관시설 | 기타 | B-A유 | 981 | 9797 | 40785 | 137 | 126 | 120 | 110 | 157 | 1
서울특별시 | 종로구 | 비산업 연소 | 상업 및 공공기관시설 | 기타 | B-B유 | 29 | 119 | 205 | 6 | 5 | 2 | 9 | 5 | NA
서울특별시 | 종로구 | 비산업 연소 | 상업 및 공공기관시설 | 기타 | B-C유 | 6268 | 69363 | 301648 | 3048 | 2795 | 1436 | 2016 | 1003 | 14
서울특별시 | 종로구 | 비산업 연소 | 상업 및 공공기관시설 | 기타 | 경유 | 4860 | 19439 | 350 | 138 | 126 | 81 | 243 | 778 | 8
서울특별시 | 종로구 | 비산업 연소 | 상업 및 공공기관시설 | 기타 | 등유 | 633 | 2279 | 11 | 5 | 5 | 4 | 32 | 101 | NA
서울특별시 | 종로구 | 비산업 연소 | 상업 및 공공기관시설 | 기타 | LPG | 1020 | 4948 | 22 | 16 | 16 | 16 | 268 | 58 | 6

Translating

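The six categorical columns appear in the summary below under English names (Area, Municipality, Sector, Industry, Type of Industry, Fuel Type), and all non-numeric columns were converted to factors.
The summary function reveals that around one third of the pollution data is missing. Column PM2.5 has 17005 missing values out of 44763 rows, or 38% of the data. Because the variance is high, as the summary below shows (the PM2.5 median is 147 while the maximum is 12681987), imputing the missing values would produce inaccurate results. Thus, rows with missing PM2.5 values were removed.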

         Area        Municipality                Sector     
 경기도    : 8196   동구   :  926   도로이동오염원  :12142  
 경상북도  : 4328   중구   :  896   제조업 연소     : 7461  
 경상남도  : 4324   서구   :  878   비산먼지        : 5333  
 전라남도  : 3963   남구   :  848   비도로이동오염원: 4420  
 서울특별시: 3527   북구   :  682   생물성 연소     : 3964  
 충청남도  : 3123   강서구 :  365   비산업 연소     : 3395  
 (Other)   :17302   (Other):40168   (Other)         : 8048  
     Industry                    Type of Industry   Fuel Type    
 기타    : 6568   기타                   : 3807   기타   :16595  
 truck   : 2887   소형                   : 2834   경유   :11103  
 sedan   : 2790   중형                   : 2363   LNG    : 5075  
 건설장비: 2268   경형                   : 1530   LPG    : 4259  
 van     : 1831   대형                   : 1487   휘발유 : 3874  
 분뇨관리: 1775   가구 및 기타제품 제조업:  894   (Other): 3641  
 (Other) :26644   (Other)                :31848   NA's   :  216  
      PM10              PM2.5         
 Min.   :       1   Min.   :       1  
 1st Qu.:      21   1st Qu.:      17  
 Median :     190   Median :     147  
 Mean   :    8286   Mean   :    3560  
 3rd Qu.:    1852   3rd Qu.:    1189  
 Max.   :24349414   Max.   :12681987  
 NA's   :16622      NA's   :17005     

Summary of PM10

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
       1       21      190     8286     1852 24349414    16622 

Summary of PM2.5

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
       1       17      147     3560     1189 12681987    17005 
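
The missing-value check described above can be sketched in a few lines. This is a minimal illustration in Python using only the standard library, not the code that produced the summaries. The toy rows and the helper names `missing_share` and `drop_missing` are assumptions made for the example; only the counts 17005 and 44763 come from the text.

```python
# Toy rows standing in for the emission table; the values are invented
# for illustration and are not taken from the real dataset.
rows = [
    {"Area": "서울특별시", "PM10": 126.0, "PM2.5": 120.0},
    {"Area": "서울특별시", "PM10": 5.0, "PM2.5": None},
    {"Area": "경기도", "PM10": 2795.0, "PM2.5": 1436.0},
    {"Area": "경기도", "PM10": None, "PM2.5": None},
]


def missing_share(rows, column):
    """Fraction of rows whose value in `column` is missing (None)."""
    return sum(row[column] is None for row in rows) / len(rows)


def drop_missing(rows, column):
    """Keep only the rows that have a value in `column`."""
    return [row for row in rows if row[column] is not None]


toy_share = missing_share(rows, "PM2.5")
complete = drop_missing(rows, "PM2.5")
print(f"toy data: {toy_share:.0%} missing, {len(complete)} rows kept")

# The share quoted in the text: 17005 missing PM2.5 values out of 44763 rows.
reported_share = 17005 / 44763
print(f"reported PM2.5 share missing: {reported_share:.1%}")
```

Dropping rows rather than imputing matches the choice made in the text; with a distribution this skewed, a mean-based fill would be dominated by a handful of very large emitters.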

Total Emission

According to the graph, 경상북도 (Gyeongsangbuk-do), 전라남도 (Jeollanam-do), 경기도 (Gyeonggi-do), and 충청남도 (Chungcheongnam-do) are the top emitters of particulate matter. In fact, 경상북도 emits about 8.2 times as much PM2.5 as Seoul (서울특별시). This is most likely due to the heavy industry in those regions. Korea could perhaps lower the overall PM level through more regulation in those provinces.

Top 10 PM Polluters

PM10
Area          TotalEmission
경상북도      44264557
전라남도      33854329
경기도        33148120
충청남도      28649572
경상남도      14786142
강원도        11869281
충청북도      11015598
전라북도      9876664
서울특별시    9162547
인천광역시    8291512

PM2.5
Area          TotalEmission
경상북도      21255264
전라남도      16140435
충청남도      13844725
경기도        10836347
경상남도      6107273
강원도        5176843
충청북도      4489802
전라북도      3281996
울산광역시    2986770
바다          2859385
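
The ranking above is a group-and-sum followed by a sort. The following is a minimal sketch under stated assumptions: the helper names `total_by_area` and `top_n` and the toy records are hypothetical, and this is not the code used to build the tables. The final ratio uses the two PM10 totals listed above for 경상북도 and 서울특별시.

```python
from collections import defaultdict


def total_by_area(records):
    """Sum emissions per area, skipping missing (None) values."""
    totals = defaultdict(float)
    for area, emission in records:
        if emission is not None:
            totals[area] += emission
    return dict(totals)


def top_n(totals, n=10):
    """Return the n areas with the largest totals, largest first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


# Toy records (area, emission); the numbers are invented for illustration.
records = [
    ("경상북도", 300.0),
    ("서울특별시", 40.0),
    ("경상북도", 200.0),
    ("경기도", 150.0),
    ("서울특별시", None),
]
ranking = top_n(total_by_area(records), n=2)
print(ranking)

# Ratio of the PM10 totals listed in the table: 경상북도 vs 서울특별시.
pm10_ratio = 44264557 / 9162547
print(round(pm10_ratio, 1))
```

Seoul does not appear in the PM2.5 top 10, so only the PM10 ratio can be recomputed from the tables shown here.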

Since the act mainly affects Seoul, the dataset was trimmed to include only Seoul (서울특별시).

         Area        Municipality               Sector         Industry   
 서울특별시:3527   강남구  : 160   도로이동오염원  :1172   기타    : 608  
 강원도    :   0   서초구  : 160   제조업 연소     : 609   sedan   : 277  
 경기도    :   0   송파구  : 159   생물성 연소     : 348   truck   : 265  
 경상남도  :   0   강서구  : 153   비산먼지        : 320   건설장비: 225  
 경상북도  :   0   영등포구: 150   유기용제 사용   : 293   van     : 197  
 광주광역시:   0   금천구  : 146   비도로이동오염원: 286   road    : 175  
 (Other)   :   0   (Other) :2599   (Other)         : 499   (Other) :1780  
 Type of Industry   Fuel Type         PM10              PM2.5        
 소형   : 296     기타   :1136   Min.   :     1.0   Min.   :    1.0  
 기타   : 280     경유   : 819   1st Qu.:    10.0   1st Qu.:    8.0  
 중형   : 250     LNG    : 605   Median :   100.5   Median :   89.0  
 대형   : 154     휘발유 : 399   Mean   :  4782.1   Mean   : 1357.7  
 경형   : 152     LPG    : 390   3rd Qu.:  1340.0   3rd Qu.:  930.2  
 특수   :  61     (Other): 171   Max.   :628239.0   Max.   :62824.0  
 (Other):2334     NA's   :   7   NA's   :1611       NA's   :1627     

Transportation

To gauge the significance of transportation in PM emissions in Seoul, the Industry column was divided into “transportation” and “non-transportation.”

However, the Seoul subset contained 3238 missing values across the PM10 and PM2.5 columns (1611 + 1627 in the summary above). Rows containing them were removed, leaving 1900 rows.

[1] 3238

According to the table, 34.7% of the industries in Seoul were categorized as “transportation”. Note that this is after removing rows with missing values; the “other” category was removed.

Transportation accounted for 30% of PM2.5 emissions and 27% of PM10 emissions.
Although this share is lower than the 50% indicated by the WHO, transportation remains a major contributor. Thus, if the Particulate Matter Mitigation Act is successful, it may yield significant results.
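
The shares above come from labeling each row as transportation or not and then comparing sums. Here is a minimal sketch of that calculation. Which Industry values were counted as transportation is not stated in the text, so the set `TRANSPORT_INDUSTRIES` below is an assumption, as are the helper names and the toy numbers.

```python
# Assumption: these Industry labels count as "transportation". The actual
# grouping used in the analysis is not given in the text.
TRANSPORT_INDUSTRIES = {"sedan", "truck", "van"}


def is_transport(row):
    return row["Industry"] in TRANSPORT_INDUSTRIES


def row_share(rows):
    """Fraction of rows categorized as transportation."""
    return sum(is_transport(row) for row in rows) / len(rows)


def emission_share(rows, pollutant):
    """Fraction of total emissions of `pollutant` that comes from transportation."""
    total = sum(row[pollutant] for row in rows)
    transport = sum(row[pollutant] for row in rows if is_transport(row))
    return transport / total


# Toy rows with invented values, for illustration only.
rows = [
    {"Industry": "sedan", "PM10": 10.0, "PM2.5": 8.0},
    {"Industry": "truck", "PM10": 30.0, "PM2.5": 22.0},
    {"Industry": "건설장비", "PM10": 40.0, "PM2.5": 30.0},
    {"Industry": "기타", "PM10": 20.0, "PM2.5": 40.0},
]
print(f"rows labeled transportation: {row_share(rows):.0%}")
print(f"PM10 share: {emission_share(rows, 'PM10'):.0%}")
print(f"PM2.5 share: {emission_share(rows, 'PM2.5'):.0%}")
```

The row share and the emission share answer different questions: the first counts categories, the second weights them by how much they emit, which is why 34.7% of rows can correspond to a smaller share of emissions.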

2018

This dataset lists the pollution level for different sectors of Seoul for each day of 2018. It will be used to compare PM levels before and after the act.

Source: https://data.seoul.go.kr/dataList/datasetView.do?infId=OA-2218&srvType=S&serviceKind=1&currentPageNo=1&searchValue=&searchKey=null

측정일시 | 측정소명 | 이산화질소농도(ppm) | 오존농도(ppm) | 이산화탄소농도(ppm) | 아황산가스(ppm) | 미세먼지(㎍/㎥) | 초미세먼지(㎍/㎥)
20180101 | 강남구 | 0.033 | 0.010 | 0.6 | 0.006 | 34 | 22
20180101 | 강남대로 | 0.040 | 0.007 | 0.8 | 0.006 | NA | 17
20180101 | 강동구 | 0.038 | 0.010 | 0.7 | 0.005 | 48 | 24
20180101 | 강변북로 | 0.033 | 0.008 | 0.6 | 0.005 | 48 | 15
20180101 | 강북구 | 0.026 | 0.018 | 0.6 | 0.004 | 38 | 18
20180101 | 강서구 | 0.036 | 0.012 | 0.7 | 0.004 | NA | 13

Only the columns date (측정일시), sector (측정소명, the measuring station), PM10 (미세먼지), and PM2.5 (초미세먼지) were kept. Since the 2019 dataset contains values only until July 1st, the 2018 dataset was trimmed to the same period. The date column was converted to a date object, and sector was converted to a factor. Rows with missing values were removed rather than filled in, since air pollution can change drastically from day to day. The pollution levels of the 46 sectors were averaged because this project aims to measure the overall PM level of Seoul as a whole.
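
The steps above (parse the date, trim to the matching period, drop incomplete rows, average across sectors) can be sketched as follows. This is a minimal illustration in standard-library Python under stated assumptions, not the original code. The function names are hypothetical. The first six records are the January 1st rows from the raw table above, and the last record is invented to show the date filter.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import mean


def parse_day(raw):
    """Convert a yyyymmdd string such as '20180101' to a date object."""
    return datetime.strptime(raw, "%Y%m%d").date()


def daily_city_average(records, start, end):
    """Average PM10 and PM2.5 across sectors for each day in [start, end].

    Rows with a missing PM10 or PM2.5 value are skipped, not imputed.
    """
    by_day = defaultdict(list)
    for raw_day, sector, pm10, pm25 in records:
        day = parse_day(raw_day)
        if pm10 is None or pm25 is None:
            continue
        if start <= day <= end:
            by_day[day].append((pm10, pm25))
    return {
        day: (mean(v[0] for v in values), mean(v[1] for v in values))
        for day, values in sorted(by_day.items())
    }


records = [
    ("20180101", "강남구", 34.0, 22.0),
    ("20180101", "강남대로", None, 17.0),
    ("20180101", "강동구", 48.0, 24.0),
    ("20180101", "강변북로", 48.0, 15.0),
    ("20180101", "강북구", 38.0, 18.0),
    ("20180101", "강서구", None, 13.0),
    ("20180702", "강남구", 30.0, 20.0),  # invented row, outside the kept period
]
averages = daily_city_average(records, date(2018, 1, 1), date(2018, 7, 1))
print(averages)
```

Note that skipping incomplete rows means a day's average may be based on fewer sectors than another day's, which is one reason the daily series is noisy.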

After eliminating the missing values, there were 128 rows left.

      date                 sector          PM10            PM2.5       
 Min.   :2018-01-01   강남구  : 128   Min.   :  4.00   Min.   :  1.00  
 1st Qu.:2018-02-01   강동구  : 128   1st Qu.: 34.00   1st Qu.: 17.00  
 Median :2018-03-07   강변북로: 128   Median : 47.00   Median : 25.00  
 Mean   :2018-03-15   강북구  : 128   Mean   : 49.71   Mean   : 28.35  
 3rd Qu.:2018-04-27   공항대로: 128   3rd Qu.: 63.00   3rd Qu.: 37.00  
 Max.   :2018-07-01   관악구  : 128   Max.   :154.00   Max.   :127.00  
                      (Other) :4155                                    

The first graph shows the daily change in PM10 and PM2.5 levels in Seoul in 2018. The graph is messy; furthermore, the lack of variance in .
The second graph shows the weekly average PM level in 2018.
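
A weekly average like the one in the second graph smooths the daily series by grouping days into weeks. The sketch below assumes ISO calendar weeks (Monday to Sunday); how weeks were actually defined for the graph is not stated, and the function name and toy values are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def weekly_average(daily):
    """Average daily values within each ISO (year, week) bucket."""
    buckets = defaultdict(list)
    for day, value in daily.items():
        iso_year, iso_week = day.isocalendar()[:2]
        buckets[(iso_year, iso_week)].append(value)
    return {week: mean(values) for week, values in sorted(buckets.items())}


# Toy daily city-wide PM10 averages; the values are invented for illustration.
daily_pm10 = {
    date(2018, 1, 1): 42.0,  # Monday, ISO week 1
    date(2018, 1, 2): 50.0,  # Tuesday, ISO week 1
    date(2018, 1, 8): 30.0,  # Monday, ISO week 2
}
weekly_pm10 = weekly_average(daily_pm10)
print(weekly_pm10)
```

Keying on the ISO year as well as the week number keeps weeks from different years apart when the 2018 and 2019 series are processed with the same function.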

2019

This dataset lists the pollution level for different sectors of Seoul from July 1st, 2018 to July 1st, 2019.

Source: https://data.seoul.go.kr/dataList/datasetView.do?infId=OA-2218&srvType=S&serviceKind=1&currentPageNo=1&searchValue=&searchKey=null

Observations: 15,508
Variables: 8
$ 측정일시              <dbl> 20190701, 20190701, 20190701, 20190701, 201907…
$ 측정소명              <chr> "강남구", "강남대로", "강동구", "강변북로", "강북구", "강서구", "공…
$ `이산화질소농도(ppm)` <dbl> 0.013, 0.047, 0.016, 0.051, 0.010, 0.018, 0.033, …
$ `오존농도(ppm)`       <dbl> 0.045, 0.037, 0.045, 0.029, 0.049, 0.053, 0.03…
$ `일산화탄소농도(ppm)` <dbl> 0.5, 0.5, 0.5, 0.4, 0.5, 0.5, 0.5, 0.4, 0.5, 0.7,…
$ `아황산가스(ppm)`     <dbl> 0.005, 0.004, 0.003, 0.004, 0.002, 0.004, 0.006…
$ `미세먼지(㎍/㎥)`     <dbl> 35, 49, 37, 42, 40, 35, NA, 40, 33, 38, 31, 30, …
$ `초미세먼지(㎍/㎥)`   <dbl> 25, 27, 29, 31, 26, 26, NA, 28, 22, 35, 27, 23, 2…

The dataset was trimmed so that the first date is January 1st, 2019. It has the same structure as the 2018 dataset, and the same operations were performed. The 2019 data had more missing values (839) than the 2018 data.

There were 175 rows left after eliminating rows with missing values.

      date                 sector          PM10            PM2.5       
 Min.   :2019-01-01   강남구  : 175   Min.   :  3.00   Min.   :  2.00  
 1st Qu.:2019-02-13   강남대로: 175   1st Qu.: 34.00   1st Qu.: 18.00  
 Median :2019-04-05   강동구  : 175   Median : 46.00   Median : 25.00  
 Mean   :2019-04-02   강변북로: 175   Mean   : 53.15   Mean   : 31.07  
 3rd Qu.:2019-05-19   강북구  : 175   3rd Qu.: 65.00   3rd Qu.: 37.00  
 Max.   :2019-07-01   강서구  : 175   Max.   :240.00   Max.   :155.00  
                      (Other) :6990                                    

Likewise, the first graph shows the daily change in PM10 and PM2.5 levels in Seoul in 2019.
The second graph shows the weekly average PM level in 2019.

Many factors affect the magnitude of air pollution; season is one of them. This project does not capture these other variables.
Ideally, the air pollution level would be compared before and after the act. However, such a comparison would be inaccurate because of the unavoidable seasonal variance.

Raph Lee